What’s in a name-less method?
When I first saw the announcements for the new Delphi 2009 features, a little more than a year ago, my reactions went something like this:
Unicode: Hmm… looks interesting.
Generics: YES! FINALLY!
Anonymous methods: …huh?
I think that’s pretty much how everyone reacted to anonymous methods at first. The syntax is kinda ugly, and you could get lost trying to read through a procedure and finding another procedure declared in the middle of it like that. And all this so you could have… drumroll please… a procedure with no name! Ta-da! Umm… yeah. OK, whatever.
Things got a little more interesting, and it became clearer what they were useful for, when people started talking about anonymous methods as closures: an anonymous method can capture local variables, hold on to that state, and be passed around. That’s actually kinda cool! But how does it work? Well, let’s find out.
[code lang="delphi"]
program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TAnonProc = reference to procedure;

procedure doStuff;
var
  counter: integer;
  i: integer;
  countProc: TAnonProc;
begin
  counter := 0;
  countProc := procedure
    begin
      inc(counter);
    end;
  for i := 1 to 10 do
    countProc;
  writeln(counter);
end;

begin
  try
    doStuff;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  readln;
end.
[/code]
When I run this, the output is 10, showing that the closure has been modifying the actual local variable of doStuff and not its own private copy. But how does that work? Local variables are normally stored on the stack, and a procedure’s stack frame is gone as soon as it returns, yet this closure can be assigned to a reference that gets passed out of the procedure’s scope. Well, let’s look at the assembly code for doStuff to get a sense of what’s really going on. It may not be pretty, but it’s honest. Nothing can hide from the assembly view.
[code lang="asm"]
Project1.dpr.16: begin
  push ebp
  mov ebp,esp
  push $00
  push $00
  push ebx
  push esi
  xor eax,eax
  push ebp
  push $0040e8f1
  push dword ptr fs:[eax]
  mov fs:[eax],esp
  mov dl,$01
  mov eax,[$0040e7a8]
  call TObject.Create
  mov esi,eax
  lea eax,[ebp-$08]
  mov edx,esi
  test edx,edx
  jz $0040e891
  sub edx,-$08
  call @IntfCopy
Project1.dpr.17: counter := 0;
  xor eax,eax
  mov [esi+$0c],eax
Project1.dpr.18: countProc := procedure
  lea eax,[ebp-$04]
  mov edx,esi
  test edx,edx
  jz $0040e8a7
  sub edx,-$10
  call @IntfCopy
Project1.dpr.22: for i := 1 to 10 do
  mov ebx,$0000000a
Project1.dpr.23: countProc;
  mov eax,[ebp-$04]
  mov edx,[eax]
  call dword ptr [edx+$0c]
Project1.dpr.22: for i := 1 to 10 do
  dec ebx
  jnz $0040e8b1
Project1.dpr.24: writeln(counter);
  mov eax,[$00411ccc]
  mov edx,[esi+$0c]
  call @Write0Long
  call @WriteLn
  call @_IOTest
Project1.dpr.25: end;
  xor eax,eax
  pop edx
  pop ecx
  pop ecx
  mov fs:[eax],edx
  push $0040e8f8
  lea eax,[ebp-$08]
  call @IntfClear
  lea eax,[ebp-$04]
  call @IntfClear
  ret
  jmp @HandleFinally
  jmp $0040e8e0
  pop esi
  pop ebx
  pop ecx
  pop ecx
  pop ebp
  ret
[/code]
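Wow, that’s a lot of code! The really interesting stuff here is what happens on the “begin” line. After a bit of standard setup, it opens an implicit try/finally block (the finally part is the code at the end that calls @IntfClear on two interface references), then does this: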
[code lang="asm"]
mov eax,[$0040e7a8]
call TObject.Create
mov esi,eax
lea eax,[ebp-$08]
mov edx,esi
test edx,edx
jz $0040e891
sub edx,-$08
call @IntfCopy
[/code]
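It places a class pointer in EAX, then calls TObject.Create to set up a new instance of that class, and stores the resulting object pointer in the ESI register. Then it copies ESI to EDX (the second parameter in the register calling convention) and adds 8 to it (that’s what sub edx,-$08 does; the test/jz pair skips the adjustment if the pointer is nil), because an interface reference points at the interface’s table slot inside the object, not at the start of the object. Then it loads the address of a hidden local variable at [ebp-$08] into EAX (the first parameter) and calls System._IntfCopy, which is declared as
[code lang="delphi"]procedure _IntfCopy(var Dest: IInterface; const Source: IInterface);[/code]
So now the procedure holds an interface reference to a new heap object that implements the closure. The assignment on line 18 repeats the same pattern, except that it adds $10 instead of $08 and stores the result in [ebp-$04]. That slot is countProc itself (it’s what the loop reads from), so the anonymous method is simply a second interface on the same object. When we go to assign to counter locally, we get this: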
[code lang="asm"]
Project1.dpr.17: counter := 0;
  xor eax,eax
  mov [esi+$0c],eax
[/code]
Aha! So it’s not stored on the stack after all; it’s treated more or less as a var parameter referring to a field inside the closure object. It’s referenced as [esi+$0C], so our “local” variable is stored 12 bytes in. That’s right where you’d expect the first field of a TInterfacedObject descendant: the VMT pointer is at offset 0, the reference count at 4, and the IInterface table pointer at 8. The table pointer for the anonymous method interface comes after the new field, at $10, which is exactly the offset the compiler added when it assigned countProc. This also explains why an anonymous method can’t capture var or out parameters or the Result variable: their storage belongs to someone else. A var or out parameter is really a pointer to the caller’s variable, and Result lives in EAX or in memory supplied by the caller, depending on the calling convention and the result type. The compiler needs to be able to move a captured variable into the closure object’s heap space, and it can’t do that if the variable’s already living somewhere else.
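To make that layout a bit more concrete, here is a rough hand-written approximation of the closure object for doStuff. It’s a sketch, not what the compiler actually emits: the class name TDoStuffFrame, the function countByHand and the explicit frameRef variable are all made up for illustration. It relies on the fact that a method reference type is an interface with a single method named Invoke, so an ordinary class can implement it. That also accounts for the call inside the loop, call dword ptr [edx+$0c]: offset $0C is the fourth slot in the interface’s method table, after QueryInterface, _AddRef and _Release. If the usual layout rules apply, this class should end up with counter at $0C and the TAnonProc table pointer at $10, just like the real one.

[code lang="delphi"]
program ManualClosure;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TAnonProc = reference to procedure;

  // Hypothetical hand-written stand-in for the hidden class the compiler
  // generates for doStuff. A method reference type is an interface with a
  // single Invoke method, so a class can implement it directly.
  TDoStuffFrame = class(TInterfacedObject, TAnonProc)
  public
    counter: Integer;   // the captured "local", now a field on the heap
    procedure Invoke;   // the body of the anonymous method
  end;

procedure TDoStuffFrame.Invoke;
begin
  Inc(counter);
end;

function countByHand: Integer;
var
  frame: TDoStuffFrame;
  frameRef: IInterface;   // assumed to play the role of the hidden local at [ebp-$08]
  countProc: TAnonProc;   // the local at [ebp-$04]
  i: Integer;
begin
  frame := TDoStuffFrame.Create;
  frameRef := frame;      // the interface reference keeps the frame alive
  frame.counter := 0;     // "counter := 0" becomes a field write
  countProc := frame;     // a second reference, through the TAnonProc interface
  for i := 1 to 10 do
    countProc();          // dispatches to TDoStuffFrame.Invoke
  Result := frame.counter;
end;  // both interface references are released here, freeing the frame

begin
  try
    Writeln(countByHand);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
[/code]

Run it and it should print 10, just like the original doStuff; the difference is that every piece the compiler normally hides is out in the open: the heap object, the field that replaces the local, and the two interface references that keep it alive.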
This raises an interesting question: What happens if you create two anonymous methods that both refer to the same local variable? Where is the variable stored then? I won’t waste space posting another big disassembly, but the answer is, “in the closure object for that procedure.” Only one gets created per call, with multiple (anonymous) methods and multiple interface references. (Incidentally, this is also the classic Lisp trick for building objects out of closures: wrapping several closures around the same set of variables.)
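Here is a minimal sketch of the two-closures case, written so that the captured variable also outlives the procedure that declared it. The names makePair, sharedCount and TAnonFunc are made up for illustration. Both anonymous methods capture the same counter, so calling incProc three times and then getFunc should give 3; if each closure had its own private copy, getFunc would return 0.

[code lang="delphi"]
program SharedCapture;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TAnonProc = reference to procedure;
  TAnonFunc = reference to function: Integer;

// Hypothetical helper: creates two anonymous methods that capture the same
// local variable. Per the analysis above, both share one closure object,
// which keeps counter alive after makePair returns.
procedure makePair(out incProc: TAnonProc; out getFunc: TAnonFunc);
var
  counter: Integer;
begin
  counter := 0;
  incProc := procedure
    begin
      Inc(counter);
    end;
  getFunc := function: Integer
    begin
      Result := counter;
    end;
end;

function sharedCount: Integer;
var
  incProc: TAnonProc;
  getFunc: TAnonFunc;
  i: Integer;
begin
  makePair(incProc, getFunc);
  for i := 1 to 3 do
    incProc();
  // getFunc reads the same counter that incProc modified
  Result := getFunc();
end;

begin
  try
    Writeln(sharedCount);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
[/code]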
There’s more interesting stuff that can be found by poking around in closure objects, but this post is getting long enough, so that’s all for now. I’ll go into more details about anonymous methods and closure objects in a few days.